Integrated Analysis of Gene Expression and Copy Number Data using Sparse Representation Based Clustering Model
Abstract
DNA microarray gene expression profiling and array comparative genomic hybridization (aCGH) are among the most widely used biological measurements. Because these measurements produce very large data sets, various clustering techniques have been developed to identify subsets of genes with specific expression patterns and large variations across samples. Because integrated analysis of genomic data from different sources can further increase the reliability of biological results, methods for integrating and analyzing different types of genomic measurements have emerged. In this work, we jointly examine gene expression and copy number data and iteratively project the data onto different clusters using a sparse representation based clustering (SRC) model. We tested the method on a breast cancer cell line data set and a breast tumor data set, and used simulated data sets to test its robustness to noise. Experiments showed that the proposed method effectively identifies genes with large variations in gene expression and copy number, and locates genes that are statistically significant in both measurements. The method is applicable to a wide variety of biological problems in which joint analysis of multiple measurements is a common challenge.
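The abstract does not spell out the SRC algorithm, so the following is only a minimal sketch of the general sparse-representation idea, not the authors' method. It assumes each gene is described by a joint vector (expression values followed by copy number values across samples), codes that vector sparsely over a dictionary of labelled example genes using a small orthogonal matching pursuit routine, and assigns the gene to the cluster whose atoms reconstruct it with the smallest residual. All names (`omp`, `src_assign`, the toy patterns, the noise level) are hypothetical and chosen for illustration.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Greedy orthogonal matching pursuit: approximate x using at most
    n_nonzero columns (atoms) of the dictionary D."""
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []
    sol = np.zeros(0)
    coef = np.zeros(D.shape[1])
    for _ in range(min(n_nonzero, D.shape[1])):
        # Pick the atom most correlated with the current residual.
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = -np.inf  # never reselect an atom
        support.append(int(np.argmax(corr)))
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    if support:
        coef[support] = sol
    return coef

def src_assign(x, D, labels, n_nonzero=3):
    """Assign x to the cluster whose atoms give the smallest
    reconstruction residual under the sparse code of x."""
    coef = omp(D, x, n_nonzero)
    clusters = np.unique(labels)
    residuals = []
    for c in clusters:
        mask = labels == c
        # Use only the coefficients that belong to cluster c.
        residuals.append(np.linalg.norm(x - D[:, mask] @ coef[mask]))
    return clusters[int(np.argmin(residuals))]

# Toy data (an assumption for illustration, not from the paper):
# each vector is [expression over 6 samples, copy number over 6 samples].
rng = np.random.default_rng(0)
n_samples = 6
base_a = np.concatenate([np.ones(n_samples), np.ones(n_samples)])    # up-regulated, gained
base_b = np.concatenate([-np.ones(n_samples), np.zeros(n_samples)])  # down-regulated, unchanged

atoms_a = base_a[:, None] + rng.normal(scale=0.05, size=(2 * n_samples, 3))
atoms_b = base_b[:, None] + rng.normal(scale=0.05, size=(2 * n_samples, 3))
D = np.hstack([atoms_a, atoms_b])
D = D / np.linalg.norm(D, axis=0)  # unit-norm atoms
labels = np.array([0, 0, 0, 1, 1, 1])

query_a = base_a + rng.normal(scale=0.05, size=2 * n_samples)
query_b = base_b + rng.normal(scale=0.05, size=2 * n_samples)
print(src_assign(query_a, D, labels), src_assign(query_b, D, labels))
```

An iterative clustering scheme could repeat this assignment step and rebuild the per-cluster dictionaries from the updated memberships until the labels stop changing; whether the paper proceeds this way is not stated in the abstract.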
Similar resources
BICoB(2011): 172-177 INTEGRATED ANALYSIS OF GENE EXPRESSION AND COPY NUMBER DATA USING SPARSE REPRESENTATION BASED CLUSTERING MODEL
Sparse Representation Based Clustering for Integrated Analysis of Gene Copy Number Variation and Gene Expression Data
Integrated analysis of multiple types of genomic data has received increasing attention in recent years, due to the rapid development of new genetic techniques and the strong demand for improved reliability of these techniques. In this work, we proposed a sparse representation based clustering (SRC) method for joint analysis of gene expression and copy number data with the purpo...
Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method
Background: Microarray experiments can simultaneously determine the expression of thousands of genes. Identification of potential genes from microarray data for diagnosis of cancer is important. This study aimed to identify genes for the diagnosis of acute myeloid and lymphoblastic leukemia using a sparse feature selection method. Materials and Methods: In this descriptive study, the expressio...
Clustering of Gene Expression Data Using Random Forest Dissimilarity
Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. These kinds of data typically involve a large number of variables (genes) compared with the number of samples (patients). Many clustering methods have been built on the dissimilarity among observations, which is calculated by a distance function. As increa...
Image Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing, where many low-level image representations exist, e.g., SIFT, HOG and so on. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis are employed to d...
Publication date: 2011